ABSTRACT


This was a study of the effects of the occurrence of
an unusual performance and of time-order on the judgment
of a sequence of performances.

Silent, color movies were made of six male operators
performing a simple reaction-time task. The operators
had been thoroughly practiced until they could deliberately
manipulate their mean reaction times (MRT). Three operators
were used as "anchoring performers" to illustrate
the top, bottom, and middle performance levels of a rating
scale. The other three, operators A, B, and C, each
performed five 1-minute trials. A and C produced relatively
constant performance levels (MRT = 1.46 sec.). Operator B,
however, had either one unusually "good" trial or one
unusually "poor" trial. The four remaining trials were
such that his overall MRT was also 1.46 seconds.

Six groups of raters (total N = 239) viewed the movie.
They saw the three anchoring performers, then separately
rated A, B, and C on their overall performance. The movie
was edited so that Group I saw operator B perform well on
the first trial, Group II on the third trial, and Group
III on the fifth trial. Group IV saw B perform poorly
on the first trial, Group V on the third trial, and
Group VI on the fifth trial.

The results showed that operator B's performance was
rated in the following manner: Group I > Group II < Group
III, when he had an unusually good trial; and Group IV <
Group V = Group VI, when he had an unusually poor trial.
These results indicated that an unusually good performance
was overly weighted in the final rating when that performance
occurred on the first trial or on the last trial,
while an unusually poor performance was overly weighted
only when it occurred on the first trial. The results
also showed that the judges gave significantly different
mean ratings to the three different operators in spite of
the fact that their performances were objectively equivalent.
Operator C, the last man rated, was given a lower
rating than either operator A or B.

It was concluded that "first impressions" of a worker
being rated (and in some instances "last impressions")
can significantly bias a performance judgment and produce
invalid ratings.
THE INFLUENCE OF UNUSUAL PERFORMANCES AND
TIME-ORDER ON PERFORMANCE JUDGMENT


Time-order effects have been observed in a variety of experiments
on learning (e.g., Brown & Overall, 1950) and in studies of
attitude and opinion formation (e.g., Luchins, 1960; Miller &
Campbell, 1959); but time-order has never been demonstrated to
affect the judgment of human performance. A "time-order effect,"
in this context, means that a judgment of a stimulus is affected by
the ordinal position of the stimulus in the series to be judged.

The  study  reported  here  was  concerned  with  the  effects  of  time- 
order  on  performance  ratings. 


INTRODUCTION 

When performance ratings are made at periodic intervals, it is
ordinarily intended that the ratings reflect the worker's average
performance during the interval between ratings. If time-order
effects are present, however, certain individual performances will
be disproportionately weighted in the total rating, and as a result,
the total rating will be invalid. The situation of concern to the
present study is one in which a rater must evaluate the overall
performance level of an operator, having seen him perform a task on
a number of occasions. The question of interest is whether or not
the operator's initial performance on the task is given a disproportionate
weight by the rater in determining his overall evaluation;
or conversely, whether or not the most recent (closest to
the time at which the rating is made) performance is given a disproportionate
weight; or alternatively, whether or not both initial
and recent performances are given greater or less weight than performances
in the middle of the series.

Other questions of practical interest involve the effect of
unusual performances in the series on the rating of the total series.
Do raters give greater weight to unusual (exceptionally good or
exceptionally poor) performances than they give to the more typical
performances? Or are such unusual performances "discounted" as
flukes and given less weight than they deserve?

These various effects have been observed in psychophysical
research and have been given names. Although there may be no precise
parallel between psychophysics and performance judgment, these
traditional names will be used here for convenience. The effects
under study in the research reported here were the following:


Primacy effect: Events occurring early in a
sequence are given greater weight than those
occurring late in the sequence in determining
a judgment of (or response to) the entire
sequence.

Recency effect: Events occurring close to the
time at which the judgment is made are given
greater weight than more remote events in determining
the overall judgment.

Terminal effect: Events occurring at the beginning
or end of a sequence are given greater or
less weight than events occurring in the middle
of a sequence in determining a judgment of the
entire sequence.

Contrast effect: Unusual events in a sequence
are given a disproportionately high weight compared
with the more typical events in determining
a judgment of the entire sequence.

Assimilation effect: Unusual events in a sequence
are given a disproportionately low weight compared
with the more typical events in determining a judgment
of the entire sequence.


Because of the complexity of the present experiment, it is difficult
to describe the experimental hypotheses without describing
the method by which they were tested. Therefore, a statement of
hypotheses will be postponed until the experimental method has been
described.
METHOD


The  Experimental  Task 

To study time-order effects on performance judgment, it is first
necessary to devise or select a task to be performed. A desirable
task for research purposes would have the following characteristics.
(1) The criterion of performance on the task would be unambiguous
and easily understood by the judges. (2) Rating performance on the
task would be difficult enough to produce differences in the ratings
made by different judges, but not so difficult that it would produce
unreliable ratings. (3) To maintain motivation on the part of the
judges, the task would require the judges to be directly involved,
perhaps even competing with the operator whose performance is to be
rated. (4) And, most important, an objective measure of the
criterion would be available to the experimenter. An experimental
task was devised to meet these requirements.

The experimental task (Figure 1) involved detecting and responding
to signals that occurred at unpredictable times and locations. The
operator stood in front of a 4' x 4' panel of 25 lights. At irregular
intervals a light (any of the 25) went out. The operator's task was
to detect the extinguished light, the "signal," and relight it as
quickly as possible by turning a switch located next to the light. A
clock measured the elapsed time between the onset of the signal and
the turning of the proper switch by the operator. The sole criterion
of performance on the task was the mean reaction time (MRT) to all
signals within a given trial period.

Preliminary  Study  of  Performance  on  the  Experimental  Task 

To  obtain  an  indication  of  characteristic  performance  on  the 
experimental  task,  20  subjects  were  tested,  each  performing  the  task 
on  five  1-minute  trials.  During  each  trial  10  signals  were  presented 
at  random  intervals  and  random  locations.  The  results  showed  that 
the  MRT  for  all  subjects  on  all  trials  was  1.46  seconds.  The  range 
of  individual  scores  was  from  0.74  seconds  for  the  fastest  subject 
to  2.25  seconds  for  the  slowest  subject.  These  performance  scores 
were  used  as  guidelines  in  devising  the  stimulus  materials  for  the 
experiment.

Stimulus  Materials 

In devising a means whereby groups of judges could observe
operators performing the task, two requirements were essential.
First, all judges must be able to observe precisely the same performances;
second, the same set of performances must be presented
to each judge, but the order of performances within the set must be
rearranged for different groups of judges. These requirements
could be met adequately by motion pictures.

Silent, color, 16 mm. motion pictures were made of six operators
separately performing the experimental task. The operators
were all males of similar age and general appearance. Each operator
wore a white laboratory coat and was positioned before the panel
so that the camera viewed the scene from over his right shoulder.

The motion picture film was purposely underexposed so that the
operator appeared to be working in a slightly darkened room. The
result was that the judges could clearly see that the operators
were indeed different men, but were unable to distinguish any
marked physical differences among them.
The operators were thoroughly practiced on the task until they
could deliberately manipulate their performance scores with the aid
of verbal alerting and feedback. Further manipulation of each operator's
performance was achieved by editing the motion picture appropriately,
so that it was possible to produce motion pictures of
operators whose performances (MRT's) were known precisely, and had,
in fact, been predetermined.

Anchoring  Performances  and  Rating  Scale 

Three of the six operators were used as "anchoring performers."
Their role was to provide standard performances by which the judges
could rate the performances of the other three operators on a 25-point
rating scale. The rating scale (Figure 2) consisted of a representation
of 25 operators arranged in a hierarchy diagonally
across the page. The figure at the top of the rating scale was
labeled "This man is the fastest we know of." The middle figure
(13th from the top) was labeled "This man is average." The figure
at the bottom of the rating scale was labeled "This man is the
slowest we know of." One-minute motion pictures were produced of
each of the three anchoring performers. One performed at MRT = 0.50
seconds and represented the top of the scale; another performed at
MRT = 1.46 seconds and represented the middle of the scale; and the
third performed at MRT = 2.75 seconds and represented the bottom of
the scale.

Performances  to  be  Rated 

The remaining three operators were those whose performances
were to be rated. Motion pictures were produced of each operator
performing the task on five 1-minute trials, with approximately a
20-second rest (blank screen) between trials. In each trial seven
signals occurred at random intervals and locations. Two of these
operators (hereafter to be called A and C) served as controls.
Their performances varied unsystematically about a MRT of 1.46
seconds so that the performance curves, when plotted by trials,
gave the appearance of flat functions (Figure 3).

Figure 2. Rating scale (top label: "This man is the fastest we
know of"; bottom label: "This man is the slowest we know of").


Figure 3. Performances of the three anchoring performers
and the two control operators, A and C.


Experimental variations in this study concerned the performance
of the remaining operator (hereafter to be called B). Two motion
pictures were produced of B performing the task on five 1-minute
trials. In one motion picture, B performed unusually well on one
trial (MRT = 0.71), but his performance on the other four trials was
such that his total MRT = 1.46. In the other motion picture, B performed
unusually poorly on one trial (MRT = 2.27*), but his performance
on the other four trials was such that his total MRT = 1.46.
Both motion pictures were edited so that the unusual trial occurred
either first, third, or fifth in the sequence of five trials. The
six different performance sequences that resulted from this manipulation
are shown in Figure 4.

*To make the performance credible to the judges, B did not respond
slowly to all seven signals, but rather briefly "overlooked" four
of the seven signals. The actual response times to the seven
signals were 1.2, 3.2, 1.2, 1.4, 3.3, 2.8, and 2.8 seconds.

Figure 4. The six different performance sequences of operator B.
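
The MRT of the unusually poor trial can be checked against the seven response times listed in the footnote. The following is a minimal sketch, not part of the original study; it takes the second response time as 3.2 seconds, and the 2.0-second cutoff used to count "overlooked" signals is a hypothetical value chosen only for illustration.

```python
from statistics import mean

# Response times (sec.) to the seven signals on B's unusually poor trial.
# Assumption: the second value is read as 3.2 sec.
response_times = [1.2, 3.2, 1.2, 1.4, 3.3, 2.8, 2.8]

# Hypothetical cutoff (not from the report) separating prompt responses
# from signals that were briefly "overlooked".
OVERLOOKED_CUTOFF = 2.0


def mean_reaction_time(times):
    """Mean reaction time (MRT) over all signals within one trial."""
    return mean(times)


mrt = mean_reaction_time(response_times)
overlooked = [t for t in response_times if t > OVERLOOKED_CUTOFF]

print(f"MRT = {mrt:.2f} sec., overlooked signals = {len(overlooked)}")
```

Under these assumptions the mean agrees with the reported trial MRT of 2.27 seconds, and four of the seven responses fall above the cutoff.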

Judges 

The judges were 322 Navy enlisted men, trainees at the USN
Fleet Radio School in San Diego, California. Of this number, 239
judges took part in the main experiment, and 83 were used in a
secondary experiment. The judges in the main experiment were
divided into six groups (Ns = 42, 36, 43, 43, 39, and 36) and
rated the performances of A, B, and C using the following procedure.


Procedure 


Each group of judges was assembled in a classroom furnished
with blackout drapes. They were given these instructions:

"This is a study of how well you can judge
the performances of operators on a certain task.
The task requires a high degree of alertness on
the part of the operator, and so you also must be
alert if you are to judge his performance accurately.

"You will view a series of motion pictures of
operators performing a simple reaction-time task.
The operator will stand in front of a large panel
of 25 lights. Occasionally a light will go out and
the operator must turn it back on as quickly as
possible by turning a switch next to the light.
Your judgment of his performance will be based on
the average speed with which he detects extinguished
lights and turns them back on. You will indicate
your judgment of his speed by marking one of these
rating sheets. The rating sheet consists of a series
of men arranged from the fastest to the slowest. The
man in the middle of the scale indicates the average
performer on this task. We will now show you what
the task is like, and then show you performances that
represent the middle, top, and bottom points on the
rating scale."


At this point the judges were shown an introductory motion
picture in which the experimenter demonstrated how the
task was to be performed. Following the demonstration film, the
anchoring performers were shown, preceded by the film titles, "Average
operator," "Fastest operator," and "Slowest operator." The narratives
that accompanied the anchoring performances were the following:


"You will now see an operator whose performance
is exactly average. If you were rating his
performance on the scale you would mark the middle
point of the scale."

"You will now see the fastest operator we have
ever tested. If you were rating his performance you
would mark the very top of the scale."

"Now you will see the slowest operator we have
ever tested. If you were rating his performance you
would mark the very bottom of the scale."


After the anchoring performances were shown, the following
instructions were given:

"Now will you please write your name in the
upper left-hand corner of each of the three rating
sheets. You will be asked to rate the performances
of three operators. Each operator will perform the
task five times. Each of the five trials will last
about one minute. At the end of the fifth trial you
will rate the operator on his total performance for
all five runs. Do not make any mark on the rating
sheet until after the fifth run. At that time make
your rating by writing the operator's identification
letter (either A, B, or C) in the appropriate figure
on the rating sheet. Remember you must consider his
total performance on all five runs in making your
ratings. Now here is the first operator whose performance
you will judge, Operator A. Remember he
will have five trials on the task and then you will
rate his total performance by marking an "A" in the
appropriate figure. Do not make any mark on your
rating sheet until after the fifth trial, and do not
make any comments to your neighbor."

The motion picture of A was shown, and the judges rated his
performance at the end of the fifth trial. The rating sheets for
A were collected and the motion picture of B was shown. After the
judges rated B, the rating sheets were collected, and the procedure
was repeated for C.
At this point a new response sheet was distributed which contained
the special box shown below:


                 Fastest    Slowest
                 Trial      Trial

Operator "A"     _____      _____

Operator "B"     _____      _____

Operator "C"     _____      _____


The judges were told:

"Write your name in the upper left-hand corner
of this new sheet. Now try to remember the performances
you have seen and in the box indicate
which of the five trials was the fastest and which
was the slowest for each operator. If you cannot
remember, just make a guess, but in any case write
in the number indicating the fastest and slowest
trial for each operator."

The purpose of having the judges recall the fastest and slowest
trials of each operator was to obtain a quantitative indication of
whether the "unusual" trial of B was distinctive enough to be recalled
by the judges.


Experimental  Variables 

Each of the six groups of judges went through the same procedure
as described in the preceding section. All conditions were the same
for each group with the exception of the performance of B. A representation
of the experimental design is shown in Figure 5. The
independent variables were (1) the type of unusual performance that
occurred on one of B's five trials (either an unusually good performance
or an unusually poor one), and (2) the ordinal position of
the unusual trial in the sequence of five trials (either the first,
third, or fifth position). All groups saw exactly the same performance
sequences by the control operators, A and C. The performances
rated by the six groups of judges are shown in Table I and in
Figures 3 and 4. It will be noted that the total MRT for each of
the three operators was 1.46 seconds and that this was equivalent
to the performance of the "average" anchoring performer. The dependent
variable was the mean rating given B by each of the six
groups of judges.
Table I

Schedule of Performances (MRT's) Presented to
the Six Groups of Judges¹

                                        Group
                          I       II      III     IV      V       VI

Anchoring performers
  "Average operator"      1.46
  "Fastest operator"      0.50          (Same as Group I)
  "Slowest operator"      2.75

Operator A
  1st trial               1.36
  2nd   "                 1.56
  3rd   "                 1.30          (Same as Group I)
  4th   "                 1.62
  5th   "                 1.43
  Total performance       1.46

Operator B
  1st trial              (0.71)   1.62    1.62   (2.27)   1.30    1.30
  2nd   "                 1.66    1.66    1.66    1.21    1.21    1.21
  3rd   "                 1.62   (0.71)   1.62    1.30   (2.27)   1.29
  4th   "                 1.69    1.69    1.69    1.25    1.25    1.25
  5th   "                 1.62    1.62   (0.71)   1.29    1.29   (2.27)
  Total performance       1.46    1.46    1.46    1.46    1.46    1.46

Operator C
  1st trial               1.53
  2nd   "                 1.21
  3rd   "                 1.67          (Same as Group I)
  4th   "                 1.36
  5th   "                 1.51
  Total performance       1.46

¹The MRT's in this table were derived from a frame-count of the
edited movies, where one frame equals 1/24 second.
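
The claim that every operator's total MRT was 1.46 seconds can be checked from the trial values in Table I. The sketch below is not part of the original report. It assumes that each trial carries equal weight (each trial contained seven signals), and the 0.01-second tolerance is an assumed allowance for the two-decimal rounding of the tabled values.

```python
from statistics import mean

# Trial MRTs (sec.) transcribed from Table I.
operator_a = [1.36, 1.56, 1.30, 1.62, 1.43]
operator_c = [1.53, 1.21, 1.67, 1.36, 1.51]
operator_b = {
    "I":   [0.71, 1.66, 1.62, 1.69, 1.62],
    "II":  [1.62, 1.66, 0.71, 1.69, 1.62],
    "III": [1.62, 1.66, 1.62, 1.69, 0.71],
    "IV":  [2.27, 1.21, 1.30, 1.25, 1.29],
    "V":   [1.30, 1.21, 2.27, 1.25, 1.29],
    "VI":  [1.30, 1.21, 1.29, 1.25, 2.27],
}

TARGET = 1.46      # total MRT reported for every operator
TOLERANCE = 0.01   # assumed allowance for rounding in the table


def overall_mrt(trials):
    """Unweighted mean of the five trial MRTs."""
    return mean(trials)


checks = {"A": overall_mrt(operator_a), "C": overall_mrt(operator_c)}
checks.update({f"B/{group}": overall_mrt(t) for group, t in operator_b.items()})

for label, value in checks.items():
    status = "ok" if abs(value - TARGET) <= TOLERANCE else "check"
    print(f"{label:6s} {value:.3f}  {status}")
```

Small departures from 1.46 are expected here, since the tabled trial values are themselves rounded from frame counts.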


HYPOTHESES 

Since overall performance was the same for each of the three
operators as seen by each of the six groups of judges, the performance
ratings, if completely valid, should be the same (and should,
in fact, be equivalent to the middle point on the rating scale).
If time-order effects are present, however, the ratings of A and C
should be the same for all groups, but there should be significant
differences between the ratings of B by the six groups of judges.
These differences should appear as a significant interaction between
the two independent variables, i.e., between the type of deviant
performance by B and the ordinal position of that performance. The
direction of the interaction would indicate the particular type of
time-order effect that is present. The specific predictions associated
with the various effects described earlier are the following:

Primacy Effect

If there is a primacy effect the initial trial will receive
a greater weight than later trials in determining the overall rating.
In the case of the control operators, A and C, this would not produce
any significant differences among the mean ratings of the six
groups of judges because these operators performed at a relatively
constant level throughout. Therefore the prediction regarding the
ratings of A and C would be the following (where the letter indicates
the operator rated and the subscript indicates the group of judges
who rated him):

$A_{I} = A_{II} = A_{III} = A_{IV} = A_{V} = A_{VI}$
and $C_{I} = C_{II} = C_{III} = C_{IV} = C_{V} = C_{VI}$

But a primacy effect would produce differences in the ratings of B.
Group I saw B perform unusually well on the first trial and Group
IV saw him perform unusually poorly. If the first-trial performances
were overly weighted the following effects on performance ratings
would occur:

$B_{I} > B_{II} = B_{III}$ and/or $B_{IV} < B_{V} = B_{VI}$

Recency  Effect 

If there is a recency effect, the fifth trial, since it was
the most recent one, will be overly weighted in determining the
performance rating. Again this should have no effect on ratings
of A and C, but should affect the rating of B. A recency effect
would produce differences among the groups of judges that would be
opposite to those predicted from a primacy effect. If the last-trial
performances were overly weighted the following effects on
performance ratings would occur:

$B_{I} = B_{II} < B_{III}$ and/or $B_{IV} = B_{V} > B_{VI}$

Terminal  Effect 

If the terminal trials (first and fifth) are weighted more
than the middle trials (second, third, and fourth), there will again
be no differences among the ratings of A and C, but predictable
differences among the ratings of B. If the first-trial and last-trial
performances were overly weighted the following effects on
performance ratings would occur:

$B_{I} > B_{II} < B_{III}$ and/or $B_{IV} < B_{V} > B_{VI}$

Contrast  and  Assimilation  Effects 

If there are contrast or assimilation effects, they will be
reflected by a significant main effect of the first independent
variable (the type of deviant performance by B) on ratings of his
performance. The null hypothesis is: $B_{I+II+III} - B_{IV+V+VI} = 0$.
That is, there would be no significant difference between the mean
ratings of B by the three groups who saw him perform unusually well
on one occasion (Groups I, II, and III) and the three groups who
saw him perform poorly on one occasion (Groups IV, V, and VI). If,
however, the unusual performance were given an unduly high weight
(contrast effect), the first three groups would rate B higher than
the last three groups. And if the unusual performance were given
an unduly low weight (assimilated to the plateau level), the
opposite result would occur.
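
The direction of the primacy, recency, and terminal predictions can be illustrated with a simple weighted-average model applied to B's six sequences from Table I. This sketch is not part of the original analysis. The doubled position weights are hypothetical values chosen only to show the direction of each effect, and the sketch assumes that a lower weighted MRT would be given a higher rating, so each inequality on MRT is the reverse of the corresponding inequality on ratings.

```python
# Trial MRTs (sec.) for operator B as seen by each group (Table I).
B_SEQUENCES = {
    "I":   [0.71, 1.66, 1.62, 1.69, 1.62],
    "II":  [1.62, 1.66, 0.71, 1.69, 1.62],
    "III": [1.62, 1.66, 1.62, 1.69, 0.71],
    "IV":  [2.27, 1.21, 1.30, 1.25, 1.29],
    "V":   [1.30, 1.21, 2.27, 1.25, 1.29],
    "VI":  [1.30, 1.21, 1.29, 1.25, 2.27],
}

# Hypothetical weights by trial position; doubling is an arbitrary choice.
WEIGHTS = {
    "none":     [1, 1, 1, 1, 1],
    "primacy":  [2, 1, 1, 1, 1],
    "recency":  [1, 1, 1, 1, 2],
    "terminal": [2, 1, 1, 1, 2],
}


def weighted_mrt(trials, weights):
    """Weighted mean of the trial MRTs."""
    return sum(w * t for w, t in zip(weights, trials)) / sum(weights)


def perceived(effect):
    """Weighted MRT of B for each group under one weighting scheme."""
    return {
        group: round(weighted_mrt(trials, WEIGHTS[effect]), 3)
        for group, trials in B_SEQUENCES.items()
    }


for effect in WEIGHTS:
    print(f"{effect:9s}", perceived(effect))
```

With equal weights the six sequences are nearly indistinguishable, which corresponds to the case of completely valid ratings. Any weighting that favors the first or last position separates the groups in the directions stated above.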


SECONDARY  EXPERIMENT 


To test the contrast and assimilation hypotheses, it was
necessary to assume that the plateau level of B's performance (i.e.,
the MRT for the four remaining trials when the deviant trial is
excluded) as seen by the first three groups was discriminably different
from the plateau level as seen by the last three groups.
To test this assumption, the remaining 83 judges were divided into
two groups (N = 41, 42) and were shown the demonstration film and
the anchoring performances. They then rated the performance of B
on two occasions. On one occasion they saw only the four trials
that represented B's plateau level as seen by Groups I, II, and III
and where total MRT = 1.65 seconds. On the other occasion they
saw only the four trials that represented B's plateau level as seen
by Groups IV, V, and VI and where total MRT = 1.26 seconds. The
order of rating was counterbalanced between the two groups of judges.
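
The two plateau levels follow from Table I once the deviant trial is dropped. The following is a minimal sketch of that arithmetic, not taken from the original report; it uses the sequences seen by Groups I and IV, in which the deviant trial is the first one.

```python
from statistics import mean

# B's sequences (sec.) as seen by Group I (good trial first) and
# Group IV (poor trial first), transcribed from Table I.
b_good_first = [0.71, 1.66, 1.62, 1.69, 1.62]
b_poor_first = [2.27, 1.21, 1.30, 1.25, 1.29]


def plateau_mrt(trials, deviant_index):
    """MRT of the remaining trials after removing the deviant trial."""
    remaining = [t for i, t in enumerate(trials) if i != deviant_index]
    return mean(remaining)


plateau_good = plateau_mrt(b_good_first, deviant_index=0)
plateau_poor = plateau_mrt(b_poor_first, deviant_index=0)

print(f"Plateau, Groups I-III:  {plateau_good:.2f} sec.")
print(f"Plateau, Groups IV-VI:  {plateau_poor:.2f} sec.")
```

The same four plateau trials appear in every group's sequence within each condition, so the choice of Groups I and IV does not affect the result.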

RESULTS 


Preliminary  Analyses 

The first concern of the data analysis was whether or not
parametric statistics could be used in testing the hypotheses.
Simple numerical values were assigned to each of the 25 points on
the rating scale, ranging from 25 for the highest point on the scale
to 1 for the lowest point. An inspection of the distributions of
ratings of each operator by each group of judges showed that they
were normal in shape, and by Hartley's test (Winer, 1962) homogeneous
in variance, thereby satisfying the major requirements for parametric
statistics.

The next preliminary analysis was performed to test the assumption
that the "unusual" trial by B was noticeably unusual to the
judges. Specifically, could Groups I, II, and III correctly
identify B's fastest trial, and could Groups IV, V, and VI correctly
identify his slowest trial? Figure 6 shows the percentages of judges
in the first three groups who selected each of the five trials as
being the "fastest" of the sequence. Figure 7 shows the percentages
of judges in the last three groups who selected each of the five
trials as being the "slowest" of the sequence. In every instance
the highest percentage occurred for the correct trial. Although
there was always a plurality of correct judgments, there was not
always a majority of correct judgments. It appeared that the
ordinal position of an unusually good performance was more accurately
recalled than the ordinal position of an unusually poor
performance. The good performance appeared to be more noticeable
when it occurred at either the beginning or the end of the sequence
than when it occurred in the middle, while the poor performance was
about equally well recalled whether it occurred at the beginning,
middle, or end of the sequence. Performances on the five trials by
A and C were evidently perceived as being homogeneous, since the
judges in all groups gave judgments of "fastest trial" and "slowest
trial" about equally often to each of the five trials.

The final preliminary analysis concerned the discriminability
of the plateau performance levels of B as seen by the first and
last three groups of judges. The ratings obtained in the secondary
experiment were used to test the assumption that judges could
accurately discriminate the difference between a plateau performance
level of MRT = 1.65 seconds and a plateau performance level of
MRT = 1.26 seconds. The results showed that 100% of the 83 judges
rated B's 1.65-second mean performance as being poorer than his
1.26-second mean performance. The former performance was given a
mean rating of 10.1, while the latter was given a mean rating of
15.8 on the 25-point scale. (Although a significance test of the
difference between mean ratings was probably superfluous, it should
be recorded that t = 16.3.) Clearly, the two performance plateaus
were discriminably different, thereby allowing contrast and assimilation
effects to be tested.
Time-order Effects


The hypothesis was that time-order effects, if present, would
produce significant differences among the six groups of judges in
their ratings of B as a result of an interaction between the type
of deviant performance by B and the ordinal position of that
performance. No such differences should occur among the ratings of A
and C. This hypothesis was tested by analyses of variance of the
ratings of the three operators by the six groups of judges.
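
The degrees of freedom for these analyses follow from the layout of the design. As a minimal sketch, assuming a fully between-groups 2 x 3 factorial (type of deviant performance by ordinal position) with the 239 judges divided among the six groups, the partition can be written as follows. The function and variable names are illustrative only and do not come from the original report.

```python
def factorial_df(levels_a: int, levels_b: int, n_total: int) -> dict[str, int]:
    """Partition degrees of freedom for a two-factor between-groups ANOVA."""
    df_a = levels_a - 1                        # main effect of factor A
    df_b = levels_b - 1                        # main effect of factor B
    df_ab = df_a * df_b                        # A x B interaction
    df_resid = n_total - levels_a * levels_b   # within-cell (residual)
    df_total = n_total - 1
    return {"A": df_a, "B": df_b, "A x B": df_ab,
            "Residual": df_resid, "Total": df_total}


# Assumed layout: 2 types of deviant performance x 3 ordinal positions,
# 239 judges in all.
df = factorial_df(levels_a=2, levels_b=3, n_total=239)
for source, value in df.items():
    print(f"{source:<10}{value:>5}")
```

Under these assumptions the partition is 1, 2, 2, and 233 degrees of freedom, with 238 in total.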

The results of the analyses of variance are shown in Tables II,
III, and IV. As predicted, there were no significant effects on the
ratings given the control operators (Tables II and IV), but there
was a significant interaction (F = 7.81, p < .01) between the two
independent variables affecting the ratings given B. This
interaction is shown graphically in Figure 8, which shows the mean ratings
of B's performance by the six groups of judges. The direction of

Figure 8. Mean rating of B's performance
by the six groups of judges.

Table II

Analysis of Variance of Ratings
of Operator A's Performance

Source                                        SS     df      MS      F

A. Type of deviant performance
   by B (good vs. poor)                      1.2      1     1.2     --
B. Ordinal position of deviant
   performance by B (1st, 3rd, or 5th)      23.8      2    11.9    1.49
A X B                                       34.8      2    17.4    2.17
Residual                                  1872.2    233     8.0
Total                                     1932      238

Table III

Analysis of Variance of Ratings
of Operator B's Performance

Source                                        SS     df      MS      F

A. Type of deviant performance
   (good vs. poor)                          21.4      1    21.4    1.23
B. Ordinal position of deviant
   performance (1st, 3rd, or 5th)           77.6      2    38.8    2.44
A X B                                      248.3      2   124.2    7.81**
Residual                                  3702.7    233    15.9
Total                                     4250      238

Table IV

Analysis of Variance of Ratings
of Operator C's Performance

Source                                        SS     df      MS      F

A. Type of deviant performance
   by B (good vs. poor)                      2.4      1     2.4     --
B. Ordinal position of deviant
   performance by B (1st, 3rd, or 5th)      59.4      2    29.7    2.18
A X B                                       37.6      2    18.8    1.38
Residual                                  3166.6    233    13.6
Total                                     3266      238

**Significant at the .01 level.

the interaction and subsequent t-tests indicated that a terminal
effect occurred when B had an unusually good trial (B_I > B_II < B_III),
and a primacy effect occurred when B had an unusually poor trial
(B_IV < B_V = B_VI).
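
The F ratios bearing on the time-order hypothesis can be checked from the sums of squares and degrees of freedom printed in Table III. The sketch below assumes the usual fixed-effects form, in which each effect's mean square is divided by the residual mean square. The helper name `f_ratio` is hypothetical, and only the ordinal-position and interaction rows are recomputed.

```python
def f_ratio(ss_effect: float, df_effect: int,
            ss_resid: float, df_resid: int) -> float:
    """F = (SS_effect / df_effect) / (SS_residual / df_residual)."""
    ms_effect = ss_effect / df_effect
    ms_resid = ss_resid / df_resid
    return ms_effect / ms_resid


# Residual row of Table III (ratings of operator B)
SS_RESID, DF_RESID = 3702.7, 233

# Ordinal position of the deviant performance (1st, 3rd, or 5th)
f_position = f_ratio(77.6, 2, SS_RESID, DF_RESID)

# Type of deviant performance x ordinal position (A X B)
f_interaction = f_ratio(248.3, 2, SS_RESID, DF_RESID)

print(f"Ordinal position: F = {f_position:.2f}")
print(f"A X B:            F = {f_interaction:.2f}")
```

To two decimals these reproduce the tabled values of 2.44 and 7.81. Small discrepancies elsewhere would be expected wherever the printed mean squares were rounded before division.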


Contrast  and  Assimilation  Effects 

There was no significant difference between the ratings given
B by the first three groups of judges and the ratings given by the
last three groups. This difference was tested in the analysis of
variance by the main effect of the first independent variable, good
vs. poor deviant performance, on the ratings of B's performance.
As shown in Table III, no significant effect (F = 1.23) was found.


Differences  Between  Ratings  of  the  Three  Operators 


Since by objective measurement of their reaction times A, B,
and C were equal in overall performance (see Table I), it was of
interest to note whether they were given equal performance ratings
by the judges. The means and standard deviations of the performance
ratings given the three operators by all groups of judges combined
(N = 239) are shown in Figure 9.

As indicated in Table V, C was rated significantly lower than
either A or B, but there was no significant difference between
the mean ratings of A and B. Further, variability of ratings
was significantly less for A than it was for either B or C.


Figure 9. Means (horizontal bars) and
standard deviations (vertical bars) of
performance ratings of the three operators
by all judges combined (N = 239).

Table V

Significance of Differences Between
Means and Standard Deviations of Performance Ratings
of Operators A, B, and C, by All Judges Combined

                        Operator
                 A         B         C

M              13.5      13.8      11.3
σ               2.9       4.2       3.8

Difference Tested          t         p

M_A - M_B                0.76       --
M_A - M_C                7.01      .001
M_B - M_C                6.80      .001
σ_A - σ_B                5.53      .001
σ_A - σ_C                4.07      .001
σ_B - σ_C                1.55

CONCLUSIONS  AND  IMPLICATIONS 

It was concluded from the results of the experiment that:

1. An unusually good performance in a sequence was
weighted more heavily by judges in determining the rating of the
entire sequence when that performance occurred either at the
beginning or end of the sequence than when it occurred in the middle of
the sequence. This was called a "terminal effect."

2. An unusually poor performance in a sequence was
weighted more heavily by judges in determining the rating of the
entire sequence when that performance occurred at the beginning of
the sequence than when it occurred later in the sequence. This was
called a "primacy effect."

3. When the ordinal position of an unusual performance
was disregarded, the unusual performance was given neither more nor
less than its due weight by judges in determining the overall
performance rating. That is, there were no significant contrast or
assimilation effects.

4. Even though performances were objectively equal,
judges in this experiment gave different ratings to different
operators.

The first two conclusions refer to the time-order effects
observed under the conditions of this experiment. It is noted that
a "terminal effect" is simply a combination of both primacy and
recency effects. Therefore, the results showed both primacy and
recency effects when an unusually good performance occurred in the
sequence, and only a primacy effect when an unusually poor
performance occurred. It might be concluded, then, that a primacy effect
consistently occurred, while a recency effect occurred only in the
presence of an unusually good performance. This suggests the
possibility that a primacy effect is a general phenomenon in
performance judgment, while a recency effect is specific to unusually
good performance.

It is possible that the terminal effect occurred because the
unusually good performance was more noticeable when it occurred at
the beginning or end of the sequence than when it occurred in the
middle. The recall data presented in Figure 6 would support such
an explanation. Since a primacy effect occurred when an unusually
poor performance was present, it would be expected that recall of
the unusually poor performance would be better when it occurred on
the first trial than when it occurred on later trials. But the
recall data presented in Figure 7 show that the unusually poor
performance was recalled equally well whether it occurred first,
third, or fifth in the sequence. Therefore, the time-order
effects that were observed in this experiment cannot consistently
be attributed to the "noticeability" of the unusual performances.

The consistent occurrence of a primacy effect in the present
experiment seems to emphasize the importance of "first impressions"
in determining performance ratings. The man who bungles a job at
the outset of his tenure may continue to be given inappropriately
low ratings even though his performance later improves. And in a
similar manner, the man who gives a strikingly good performance when
he first comes under the observation of raters may continue to be
given inappropriately high ratings even though his performance later
deteriorates.

It is interesting to note that the last man rated, operator C,
was given a significantly lower mean rating than either of the first
two operators, in spite of the fact that performances were objectively
equal for all three. This result could be attributed to the
characteristics of the man himself, or to the fact that he was the last
to be rated. Although the experimental design did not allow the
assessment of the separate effects of individual operators and the
ordinal position of an operator, it seems unlikely that the lower
rating given operator C was attributable to his personal
characteristics. As noted earlier, the operators were of similar appearance
and the motion picture films were deliberately underexposed, making
it difficult for the judges to perceive physical differences among
the operators. It is most probable that the significant differences
between the mean performance rating given C and those given A and
B were a result of another type of time-order effect. Thus, the
rating given an individual worker may depend not only upon the
sequence of his own performances, but also upon his ordinal position
among the other workers to be rated.

Further study of time-order effects in the context of
performance judgment should be directed toward the study of the following
variables: (1) the number of performances observed before rating
(i.e., rating after every performance vs. rating after a sequence
of performances); (2) the duration of the time interval between
performances; and (3) the duration of the time interval between
the last performance observed and the time at which the rating is
made. Until the influence of these variables becomes known, it
may not be possible to assess accurately the validity of ratings
of human performance.